Running Parallel Applications on an MP with Multithreaded Superscalar Processors
Authors
Abstract
With diminishing returns on adding more complexity to conventional superscalar processors, simultaneous multithreaded (SMT) superscalar processors seem to be a promising alternative. Unfortunately, most previous work has focused on systems running multiprogrammed loads of sequential applications. It is not clear how well these processors work in a shared-memory multiprocessor environment running parallel applications. This paper examines this issue for automatically parallelized and explicitly parallel applications.
This paper shows that SMT-based multiprocessors deliver higher performance than multiprocessors with conventional superscalars and are more robust in the presence of low-performing memory systems. Furthermore, while chips with multiple processors, called superchips, are simpler than chips with an SMT processor, the performance of a superchip-based system running memory-intensive parallel applications is much lower than that of a plain SMT-based system. In addition, superchip-based systems are not a good option for workloads with sequential applications. Finally, we propose hardware support to improve the synchronization performance of SMT processors. With this optimization, SMT-based multiprocessors run an average of 14% faster than superchip-based multiprocessors. Overall, SMT-based multiprocessors are the most cost-effective organization to run parallel applications.
Related Resources
Comparison of features for current commercial multicore
In the past, developers used additional capacity to develop superscalar CPUs with replicated execution units and deep pipelines to exploit instruction-level parallelism. However, they only harvested about 25 percent of the additional chip space that became available per year by adding new architectural features. Moreover, t...
Multithreaded Processors
The instruction-level parallelism found in a conventional instruction stream is limited. Studies have shown the limits of processor utilization even for today's superscalar microprocessors. One solution is the additional utilization of more coarse-grained parallelism. The main approaches are the (single) chip multiprocessor and the multithreaded processor which optimize the throughput of multip...
Simultaneous Multithreading
Current research in processor technology and computer architecture is motivated primarily by the need for greater performance. In this context, it is well understood that the performance gain from improving the memory system alone is limited, and using system Level Integration (such as supporting graphics/sound on chip) can only lead to marginal performance benefits. The most significant gain c...
Integrating Multiple Forms of Multithreaded Execution on SMT Processors: A Quantitative Study with Scientific Workloads
Simultaneous multithreaded (SMT) processors have penetrated the mainstream computing market, since they offer a number of cost / performance advantages over conventional superscalar processors at a nominal additional cost. Simultaneous multithreading can be used in the execution engine of a single monolithic microprocessor, or be embedded and replicated in the execution cores of a chip multipro...
Design and Validation of a Simultaneous Multi-Threaded DLX Processor
Modern-day computer systems rely on two forms of parallelism to achieve high performance: parallelism between individual instructions of a program (ILP) and parallelism between individual threads (TLP). Superscalar processors exploit ILP by issuing several instructions per clock, and multiprocessors (MP) exploit TLP by running different threads in parallel on different processors. A fundamental...
Publication date: 2007